Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

internal/manifest: narrow Overlaps to exclude RANGEDEL sentinel keys #1432

Merged
merged 1 commit into from
Jan 13, 2022

Conversation

jbowens
Copy link
Collaborator

@jbowens jbowens commented Dec 28, 2021

When the largest key in a sstable is a range tombstone, the sstable's largest
boundary is set to the range deletion sentinel with the tombstone's exclusive
end boundary and the maximum sequence number. The range deletion sentinel
serves as a marker, indicating that the file's bounds end immediately before
the first key with the sentinel's user key.

In many places such as growing compactions or determining atomic compaction
units, Pebble considers the sentinel as exclusive and avoids unnecessarily
including files that share the same user key but only as a sentinel's exclusive
end bounary.

The (*manifest.Version).Overlaps method did not share this logic and only
considered user keys. This method is used during compaction picking when
finding the initial compaction inputs, for determining in-use key ranges,
grandparent files, etc.

This change adapts this method to consider a file with a largest user key equal
to the search range's start user key nonoverlapping if the file's largest key
is a range deletion sentinel. Additionally, Overlaps now takes an exclusiveEnd
parameter, indicating whether the end user key provided to Overlaps should
similarly be treated as an exclusive bound.

This change is expected to reduce write amplification in the presence of range
deletions by avoiding unnecessarily pulling in files due to perceived overlap.

Additionally, some of these RangeDeletionSentinel comparisons are updated to
use a new (*InternalKey).IsExclusiveSentinel helper. With the introduction of
range keys, we will have additional exclusive end boundary key kinds.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@jbowens jbowens changed the title internal/manifest: narrow Overlaps to exclude RANGEDEL sentinel keys. internal/manifest: narrow Overlaps to exclude RANGEDEL sentinel keys Dec 29, 2021
Copy link
Contributor

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch! It's been a while since we caught an instance of Pebble not special-casing rangedel sentinel keys, but I'm not surprised there's one more.

Reviewed 4 of 4 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @jbowens and @sumeerbhola)


internal/manifest/version.go, line 517 at r1 (raw file):

// The returned files are a subsequence of the input files, i.e., the ordering
// is not changed.
func (v *Version) Overlaps(level int, cmp Compare, start, end []byte) LevelSlice {

You're handling the f.Largest.UserKey == start && f.Largest.Trailer = RangeDelSentinel case correctly, but we can do better and also handle the end == RangeDelSentinel && f.Smallest.UserKey == end case. But since end is a user key, you'd have to add an exclusiveEnd bool in the signature and make the relevant signature changes everywhere.

What do you think? I think that'd be a good idea to also fix while we're addressing this, but if you feel otherwise, I'm good with :lgtm: -ing this as-is.

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

Reviewed 3 of 4 files at r1.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @itsbilal and @jbowens)


internal/manifest/version.go, line 517 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

You're handling the f.Largest.UserKey == start && f.Largest.Trailer = RangeDelSentinel case correctly, but we can do better and also handle the end == RangeDelSentinel && f.Smallest.UserKey == end case. But since end is a user key, you'd have to add an exclusiveEnd bool in the signature and make the relevant signature changes everywhere.

What do you think? I think that'd be a good idea to also fix while we're addressing this, but if you feel otherwise, I'm good with :lgtm: -ing this as-is.

+1 to handling the end being exclusive case. There seem to be enough callers who have an [InternalKey, InternalKey] and we are losing information by passing in only the UserKey.


testdata/range_del, line 1269 at r1 (raw file):

----
1:
  000008:[a-c]

can this test be changed to print the InternalKey bounds. It will make it easier to see that the test cases are actually correct.

@jbowens jbowens force-pushed the overlaps-sentinel branch 7 times, most recently from 70c92e2 to e4fab74 Compare January 5, 2022 00:18
Copy link
Collaborator Author

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the exclusiveEnd parameter. I also added a (*InternalKey).IsExclusiveSentinel helper, because range keys will also have exclusive sentinel key kinds. Some of this logic will need to further change with range keys, for example, levelIter and mergingIter will only care if the largest key that affects points (eg, point key or range del) is a sentinel. I've deferred that for now.

Reviewable status: 0 of 23 files reviewed, 2 unresolved discussions (waiting on @itsbilal and @sumeerbhola)


internal/manifest/version.go, line 517 at r1 (raw file):

Previously, sumeerbhola wrote…

+1 to handling the end being exclusive case. There seem to be enough callers who have an [InternalKey, InternalKey] and we are losing information by passing in only the UserKey.

Done.


testdata/range_del, line 1269 at r1 (raw file):

Previously, sumeerbhola wrote…

can this test be changed to print the InternalKey bounds. It will make it easier to see that the test cases are actually correct.

Done.

Copy link
Contributor

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 6 of 17 files at r2, 17 of 17 files at r3, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @sumeerbhola)

Copy link
Contributor

@nicktrav nicktrav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @sumeerbhola)

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 4 of 17 files at r2, 16 of 17 files at r3.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @jbowens and @sumeerbhola)


internal/manifest/version.go, line 275 at r3 (raw file):

	if !endIter.iter.valid() {
		endIter.Prev()
	} else if c := cmp(endIter.Current().Smallest.UserKey, end); c > 0 || c == 0 && exclusiveEnd {

nit: we don't need to do the comparison if !exclusiveEnd, so may as well avoid it.


internal/manifest/version_test.go, line 264 at r3 (raw file):

		// Level 2: empty.
		{2, "a", "z", false, ""},
	}

I'm being paranoid here, but curious if coverprofile shows that we've covered all the code paths.

Copy link
Collaborator Author

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @sumeerbhola)


internal/manifest/version.go, line 275 at r3 (raw file):

Previously, sumeerbhola wrote…

nit: we don't need to do the comparison if !exclusiveEnd, so may as well avoid it.

I might be misunderstanding, but if !exclusiveEnd the above for loop might've Nexted to a file with Smallest.UserKey > end, and we need the comparison to check for that.


internal/manifest/version_test.go, line 264 at r3 (raw file):

Previously, sumeerbhola wrote…

I'm being paranoid here, but curious if coverprofile shows that we've covered all the code paths.

Just checked and it does 👍

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 16 of 23 files reviewed, 3 unresolved discussions (waiting on @itsbilal, @jbowens, and @sumeerbhola)


internal/manifest/version.go, line 275 at r3 (raw file):

Previously, jbowens (Jackson Owens) wrote…

I might be misunderstanding, but if !exclusiveEnd the above for loop might've Nexted to a file with Smallest.UserKey > end, and we need the comparison to check for that.

I see. Could you add a tiny example in a code comment -- it will make future readers very happy :)


internal/manifest/version_test.go, line 264 at r3 (raw file):

Previously, jbowens (Jackson Owens) wrote…

Just checked and it does 👍

Thanks

When the largest key in a sstable is a range tombstone, the sstable's largest
boundary is set to the range deletion sentinel with the tombstone's exclusive
end boundary and the maximum sequence number. The range deletion sentinel
serves as a marker, indicating that the file's bounds end immediately before
the first key with the sentinel's user key.

In many places such as growing compactions or determining atomic compaction
units, Pebble considers the sentinel as exclusive and avoids unnecessarily
including files that share the same user key but only as a sentinel's exclusive
end bounary.

The `(*manifest.Version).Overlaps` method did not share this logic and only
considered user keys. This method is used during compaction picking when
finding the initial compaction inputs, for determining in-use key ranges,
grandparent files, etc.

This change adapts this method to consider a file with a largest user key equal
to the search range's start user key nonoverlapping if the file's largest key
is a range deletion sentinel. Additionally, Overlaps now takes an exclusiveEnd
parameter, indicating whether the end user key provided to Overlaps should
similarly be treated as an exclusive bound.

This change is expected to reduce write amplification in the presence of range
deletions by avoiding unnecessarily pulling in files due to perceived overlap.

Additionally, some of these RangeDeletionSentinel comparisons are updated to
use a new (*InternalKey).IsExclusiveSentinel helper. With the introduction of
range keys, we will have additional exclusive end boundary key kinds.
Copy link
Collaborator Author

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!

Reviewable status: 15 of 23 files reviewed, 2 unresolved discussions (waiting on @itsbilal, @jbowens, and @sumeerbhola)


internal/manifest/version.go, line 275 at r3 (raw file):

Previously, sumeerbhola wrote…

I see. Could you add a tiny example in a code comment -- it will make future readers very happy :)

Done!

@jbowens jbowens merged commit d4f0a8d into cockroachdb:master Jan 13, 2022
@jbowens jbowens deleted the overlaps-sentinel branch January 13, 2022 02:41
jbowens added a commit to jbowens/pebble that referenced this pull request Jan 20, 2022
When Version.Overlaps is called for L0, the overlap window is
iteratively expanded until stable. In cockroachdb#1432, the Overlaps function was
adjusted to allow specifying that the end bound should be considered
exclusive. However, cockroachdb#1432 failed to update the exclusivity of the end
bound when the range widened. This improperly excluded files with
largest keys that exactly equaled the new widened end bound.

This commit also transforms the TestOverlaps test into a datadriven
test, introducing a few helpers for parsing the DebugString output of a
Version.

Fix cockroachdb#1459.
jbowens added a commit to jbowens/pebble that referenced this pull request Jan 21, 2022
When Version.Overlaps is called for L0, the overlap window is
iteratively expanded until stable. In cockroachdb#1432, the Overlaps function was
adjusted to allow specifying that the end bound should be considered
exclusive. However, cockroachdb#1432 failed to update the exclusivity of the end
bound when the range widened. This improperly excluded files with
largest keys that exactly equaled the new widened end bound.

This commit also transforms the TestOverlaps test into a datadriven
test, introducing a few helpers for parsing the DebugString output of a
Version.

Fix cockroachdb#1459.
jbowens added a commit that referenced this pull request Jan 21, 2022
When Version.Overlaps is called for L0, the overlap window is
iteratively expanded until stable. In #1432, the Overlaps function was
adjusted to allow specifying that the end bound should be considered
exclusive. However, #1432 failed to update the exclusivity of the end
bound when the range widened. This improperly excluded files with
largest keys that exactly equaled the new widened end bound.

This commit also transforms the TestOverlaps test into a datadriven
test, introducing a few helpers for parsing the DebugString output of a
Version.

Fix #1459.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants